Abstract

A two-point correlation function provides a crucial yet incomplete characterization of a
microstructure because distinctly different microstructures may have the same correlation function.
In an earlier Letter [Gommes, Jiao and Torquato, Phys. Rev. Lett. 108, 080601 (2012)], we 
addressed the microstructural degeneracy question: What is the number of microstructures com- 
patible with a specified correlation function? We computed this degeneracy, i.e., configurational 
entropy, in the framework of reconstruction methods, which enabled us to map the problem to the 
determination of ground-state degeneracies. Here, we provide a more comprehensive presentation 
of the methodology and analyses, as well as additional results. Since the configuration space of a 
reconstruction problem is a hypercube on which a Hamming distance is defined, we can calculate 
analytically the energy profile of any reconstruction problem, corresponding to the average energy 
of all microstructures at a given Hamming distance from a ground state. The steepness of the 
energy profile is a measure of the roughness of the energy landscape associated with the recon- 
struction problem, which can be used as a proxy for the ground-state degeneracy. The relationship 
between this roughness metric and the ground-state degeneracy is calibrated using a Monte Carlo 
algorithm for determining the ground-state degeneracy of a variety of microstructures, including 
realizations of hard disks and Poisson point processes at various densities as well as those with 
known degeneracies (e.g., single disks of various sizes and a particular crystalline microstructure).
We show that our results can be expressed in terms of the information content of the two-point 
correlation functions. From this perspective, the a priori condition for a reconstruction to be ac- 
curate is that the information content, expressed in bits, should be comparable to the number of 
pixels in the unknown microstructure. We provide a formula to calculate the information content 
of any two-point correlation function, which makes our results broadly applicable to any field in 
which correlation functions are employed. 
I. INTRODUCTION



Correlation functions are important structural descriptors that arise in a variety of
disciplines such as solid state physics [1], signal processing [2], computer vision [3], statistical
physics [4], geostatistics [5], and materials science [6-8]. Many techniques for structural
characterization of materials over a wide range of length scales provide data in the form of
correlation functions including, but not limited to, scattering methods [9, 10]. Other examples
are absorption spectroscopy [11], energy transfer analysis [12], nuclear magnetic resonance
[13], and also grey-scale image analysis [14, 15]. Moreover, in the case of in situ studies
with a nanometer resolution [16-19], correlation functions are often the only data available
experimentally.

Despite the widespread use of correlation functions, the nature of the structural
information they contain remains an open area of active research [20-28]. The central question
of the present paper can be put as follows: how accurately is a microstructure known when
the only data available is a two-point correlation function? We shall focus our analysis on
two-phase microstructures and the two-point correlation function $S_2(r)$, which is defined to
be the probability that two random points at a distance $r$ from one another both belong to
the same phase [7].

Two-point statistics are not sufficient to determine unambiguously a microstructure. The
specification of a given two-point function $S_2(r)$ is equivalent to defining a macrostate of the
system, the degeneracy of which can be expressed as a configurational entropy. In particular,
all space transformations that preserve distances (translation, rigid rotation and inversion)
lead to microstructures with identical two-point statistics. Following previous work, we
will call such degeneracies trivial. The focus of the present paper is on non-trivially
degenerate microstructures, which cannot be obtained from each other through any of the
aforementioned trivial transformations.

Examples of non-trivially degenerate microstructures are Poisson polyhedra tessellations
of three-dimensional space [3] and Debye random media [21, 29, 30], which, although having
distinct microstructures, have identical $S_2(r)$. Non-trivial degeneracy is not limited to
infinite systems. Examples of finite point patterns having the same two-point statistics have
been given as early as 1939, and Patterson coined the word "homometric" to qualify them
[31]. Very recently, general equations have been derived that can in principle be solved
analytically to obtain homometric microstructures [26, 27]. In the context of crystallography,
the degeneracy of the structural information contained in correlation functions is referred to
as the phase problem.

The phase problem, however, is not universally applicable. A spectacular counterexample
is the so-called direct method of crystallography [32], for which Hauptman and Karle received
the 1985 Nobel Prize in Chemistry. In the field of computer vision, it has been shown
that finite textures are completely characterized by their orientation-dependent correlation
functions [33]. Many theoretical examples of microstructures with a low degeneracy can
be accurately reconstructed from their correlation function alone [25, 35]. All these
examples have in common that they incorporate orientation information [36]. The focus
of the present work is on radial correlation functions in which orientation information is
averaged out. This simplification is relevant to many experimental contexts, notably
small-angle scattering, where the correlation function is generally rotationally averaged
through the measurement of powder scattering patterns, as well as to isotropic disordered
systems in general.

The understanding of the structural information in radial correlation functions has been
considerably advanced through the use of reconstruction algorithms, which aim at producing
microstructures with a specified correlation function via the minimization of a prescribed
energy functional [37]. In the case of a reconstruction based on two-point correlation
functions, a natural choice for the energy functional is [21]

\[ E = \sum_r \left[ \hat{S}_2(r) - S_2(r) \right]^2 , \tag{1} \]

where $\hat{S}_2(r)$ is the target two-point correlation function, $S_2(r)$ is the correlation function of
the microstructure, i.e., the configuration being optimized, and the sum is over all measurable
distances. This definition of the energy is equivalent to a norm-2 error: it is non-negative
and it vanishes only for those configurations that satisfy $S_2(r) = \hat{S}_2(r)$. In this context,
the question of the degeneracy associated with a given correlation function is equivalent
to determining the number of microstructures having zero energy, i.e., the ground-state
degeneracy of the energy functional [40].

The minimization of Eq. (1) is generally done by discretizing the microstructure on a
grid with periodic boundary conditions, and by using either a steepest descent [37] or a
simulated annealing [21] algorithm. In the case of a two-phase microstructure, which can be
thought of as an image with black and white pixels, the simulated annealing reconstruction
proceeds as follows. Starting from any configuration, with value $E_i$ of the energy functional
Eq. (1), a black pixel is chosen randomly and moved to any available white position. The
function $S_2(r)$ is updated and the new energy $E_j$ is calculated. The move is accepted with
probability

\[ p = \min\left\{ 1, \exp\left[ (E_i - E_j)/T \right] \right\} , \tag{2} \]

where $T$ is a temperature parameter [41]. All energy-decreasing moves are therefore
accepted but some energy-increasing moves are accepted as well, depending on the chosen
temperature. The latter moves are necessary to ensure that the entire configuration space
can in principle be explored, and that the system is not trapped in a local minimum of $E$.
Simulated annealing algorithms consist in starting at a high temperature, and progressively
decreasing the temperature until convergence is reached ($E \approx 0$). This type of approach
has been widely used for microstructure reconstruction, in the context of both applications
[45] and theoretical investigations [23, 35, 46]. The latter include generalizations
to other types of statistical microstructure descriptors besides $S_2(r)$, most notably to
higher-order correlation functions [22, 39] as well as to cluster correlation functions [39, 47].
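The annealing loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `s2_counts`, `energy` and `anneal`, the geometric cooling schedule, and all default parameters are hypothetical. For brevity the energy is computed on unnormalized pair counts, which vanishes for exactly the same configurations as Eq. (1) but weights the distances differently, and the pair statistics are recomputed from scratch after each move rather than updated incrementally.

```python
import math
import random
from itertools import combinations

def s2_counts(black, L):
    """Pair counts keyed by squared distance on an L x L periodic grid."""
    counts = {}
    for (x1, y1), (x2, y2) in combinations(black, 2):
        dx, dy = abs(x1 - x2), abs(y1 - y2)
        dx, dy = min(dx, L - dx), min(dy, L - dy)   # periodic boundaries
        key = dx * dx + dy * dy
        counts[key] = counts.get(key, 0) + 1
    return counts

def energy(counts, target):
    """Squared mismatch between pair counts; zero only if they coincide."""
    keys = set(counts) | set(target)
    return sum((counts.get(k, 0) - target.get(k, 0)) ** 2 for k in keys)

def anneal(target, L, n1, steps=20000, t0=1.0, cooling=0.9995, seed=0):
    """Simulated annealing with the Metropolis acceptance rule of Eq. (2)."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(L) for y in range(L)]
    rng.shuffle(cells)
    black, white = cells[:n1], cells[n1:]            # random starting state
    e, t = energy(s2_counts(black, L), target), t0
    for _ in range(steps):
        if e == 0:                                   # ground state reached
            break
        i, j = rng.randrange(len(black)), rng.randrange(len(white))
        black[i], white[j] = white[j], black[i]      # move a black pixel
        e_new = energy(s2_counts(black, L), target)
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            e = e_new                                # accept the move
        else:
            black[i], white[j] = white[j], black[i]  # reject: undo the move
        t *= cooling                                 # assumed cooling schedule
    return black, e
```

A call such as `anneal(s2_counts(disk, 8), 8, len(disk))`, with `disk` a list of pixel coordinates, returns the final configuration and its energy; a returned energy of zero means that a ground state was found, not necessarily the target itself.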



Examples of reconstructions of two-phase microstructures under periodic boundary
conditions are given in Fig. 1. In the case of the single disk, the reconstructed microstructure
is almost identical to the target, except for a translation (top portion of Fig. 1). In the
case of the reconstruction of the hard disks (middle portion of Fig. 1), the characteristic
size of the disks as well as the average distance between them is recovered. However, an
exact reconstruction of the target configuration is not possible; spurious objects are also
formed through the partial merging of neighboring disks. In the case of a realization of a
Poisson point process (randomly coloring a pixel black according to a prescribed volume
fraction), the reconstructed and the target microstructures might look superficially similar
because they both appear to be random distributions of black pixels (bottom portion of
Fig. 1). However, the two microstructures have very little in common if one is interested in the
exact configurations of the pixels, although an excellent match is obtained between $\hat{S}_2(r)$
and $S_2(r)$. This illustrates the concept of non-trivial degeneracy [21].

FIG. 1. (Color online) From top to bottom: reconstructions of a single disk, hard disks, and the
realization of a Poisson point process under periodic boundary conditions. In each case, the target
(left) and reconstructed (right) microstructures are shown. The target (•) and reconstructed (solid line)
correlation functions are indistinguishable on the scale of the figure. The size of the grid is $32 \times 32$
pixels with $N_1 = 200$. These examples strongly suggest that the two-point function of a single
sphere under periodic boundary conditions is only trivially degenerate through translation, but
that the two-point degeneracy of a Poisson point process has a large non-trivial contribution.

In a recent Letter [40], we presented a general theoretical framework for estimating
quantitatively the structural degeneracy corresponding to any specified correlation function.
This was achieved by mapping the problem to the estimation of a ground-state degeneracy
through the use of Eq. (1). Here we provide a more comprehensive presentation of
the methodology and analyses, including a quantitative characterization of the energy
landscape associated with the reconstruction as well as a detailed derivation of the degeneracy
metric. Moreover, we show that our results can be expressed in terms of the information
content of the two-point correlation functions. Although the present work focuses on
two-dimensional media in Euclidean space, our procedure can be applied in any space dimension
and generalized to non-Euclidean spaces (e.g., compact and hyperbolic spaces).

The remainder of the paper is organized as follows. In Sec. II, we discuss the degeneracy
of the two-point statistics for a variety of microstructures that are used as benchmarks
throughout the rest of the paper. We consider successively small systems, for which all the
configurations can be enumerated; intermediate systems, for which the degeneracy can be
determined via a Monte Carlo method we presented elsewhere [40]; and large systems, for
which neither of the aforementioned two methods applies and one needs to use the
reconstruction method. In Sec. III, we devise an analytical method, based on a random walk in
configuration space, to characterize the energy landscape associated with reconstruction. In
particular, we determine a characteristic energy profile for the basin of each ground state. In
Sec. IV, we show that the ground-state degeneracy of reconstruction problems is related to
the roughness of the energy landscape. We introduce a roughness metric that can be
calculated from $S_2(r)$ alone, and we show definitively that it is correlated with the microstructure
degeneracy. In Sec. V, the degeneracy is expressed in terms of the information content of
$S_2(r)$, and a formula is proposed relating the roughness metric to this information content.
The practical usefulness of our results is discussed.



II. THE DEGENERACY OF TWO-POINT STATISTICS 
A. Small-system-size microstructures: countable examples 

The present paper is restricted to two-phase digitized microstructures, which can be
thought of as images with $N_1$ black pixels and $N_0 = N - N_1$ white pixels. However, our
analysis can be easily generalized to multiphase microstructures. We shall first consider the
very small microstructures of Fig. 2 with $N_1 = 4$. They will be analyzed in some detail
and will serve as a benchmark for analytical methods applicable to larger and more complex
microstructures.

For any finite microstructure it is always possible to refer to the pixels through a linear
index, $i = 1$ to $N$, independently of the actual dimensionality. A finite microstructure is
therefore completely characterized by an $N$-dimensional vector, with components $I(i)$ equal
to 1 when point $i$ is a black pixel, and 0 otherwise. The two-point correlation function $S_2(r)$
of the black phase is defined as the probability that two random pixels at a distance $r$ from
one another are both black [7]. This can be written formally as

\[ S_2(r) = \frac{1}{N \omega_r} \sum_{i=1}^{N} \sum_{j=1}^{N} I(i)\, D_r(i,j)\, I(j) , \tag{3} \]




FIG. 2. Examples of small-system-size microstructures (under periodic boundary conditions)
having different two-point degeneracies. From A to E, the degeneracies are $\Omega_0 = N$, $2N$, $8N$, $12N$,
$16N$, with $N$ the total number of pixels in the grid (see text). Systems C to E have a non-trivial
contribution to their degeneracy.

where $D_r(i,j)$ takes the value 1 if the distance between pixels $i$ and $j$ is $r$, and 0 otherwise.
The quantity $\omega_r$ is defined as $\omega_r = \sum_i D_r(i,j)$. In Eq. (3) the double sum counts the
pairs of black pixels separated by a distance $r$, and the pre-factor normalizes that count by
the total number of pixel pairs at a distance $r$ from one another. The periodic boundary
conditions are incorporated in the definition of the operator $D_r(i,j)$. We assume in the rest
of the paper that the discretizing grid is uniform in the sense that $\omega_r$ is independent of $j$.
The use of a discrete pixel grid is equivalent to a "quantizer" problem [48], in which every
point of the microstructure is quantized to the centroid of its closest pixel. The distances
$r$ between pairs of points are therefore approximated by distances that are compatible with
the grid. A square grid is used throughout the present paper. For finite-size systems, the
quantization naturally introduces some grid-specific artifacts [49]. However, the quantization
error decreases and becomes zero in the limit of infinitely large microstructures.
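As an illustration of Eq. (3), the following minimal Python sketch evaluates $\omega_r$, the pair counts and $S_2(r)$ for a small configuration on a square periodic grid. The function names (`pair_counts`, `omega`, `s2`) and the choice of indexing distances by the integer $r^2$, which avoids floating-point radii, are assumptions made for this example.

```python
from collections import defaultdict

def pair_counts(black, L):
    """Number of unordered black-pixel pairs at each squared distance r^2.

    `black` is a list of (x, y) pixel coordinates on an L x L periodic grid.
    """
    counts = defaultdict(int)
    for a in range(len(black)):
        for b in range(a + 1, len(black)):
            dx = abs(black[a][0] - black[b][0])
            dy = abs(black[a][1] - black[b][1])
            dx, dy = min(dx, L - dx), min(dy, L - dy)  # minimum-image convention
            counts[dx * dx + dy * dy] += 1
    return dict(counts)

def omega(L):
    """omega_r: number of pixels at squared distance r^2 from any given pixel."""
    w = defaultdict(int)
    for dx in range(L):
        for dy in range(L):
            mx, my = min(dx, L - dx), min(dy, L - dy)
            w[mx * mx + my * my] += 1
    return dict(w)

def s2(black, L):
    """S2(r) keyed by r^2; the double sum of Eq. (3) equals twice the pair count."""
    N, w, pairs = L * L, omega(L), pair_counts(black, L)
    out = {0: len(black) / N}            # S2(0) is the fraction of black pixels
    for r2, w_r in w.items():
        if r2 > 0:
            out[r2] = 2 * pairs.get(r2, 0) / (N * w_r)
    return out

square = [(0, 0), (1, 0), (0, 1), (1, 1)]   # a 2 x 2 block of black pixels
print(pair_counts(square, 8))
print(s2(square, 8)[1])
```

Because the grid is uniform, `omega` can be evaluated from a single reference pixel, which is the property assumed in the text when $\omega_r$ is taken to be independent of $j$.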

The two-point correlation functions of the microstructures of Fig. 2 are given in Table I
in the form $P(r) = N \omega_r S_2(r)/2$. The quantity $P(r)$ is equal to the number of pairs
of points at distance $r$ from one another. Note that although configurations $C_1$ and $C_2$ are
different, they have identical two-point characteristics. The same applies to $D_1$ and $D_2$, as
well as to $E_1$ and $E_2$. A complete enumeration of all microstructures with $N_1 = 4$ shows
that there is no other configuration with the same $S_2(r)$.

TABLE I. Number of pairs $P(r) = N \omega_r S_2(r)/2$ in microstructures A to E of Fig. 2. Note that
$P(r)$ is identical for configurations $C_1$ and $C_2$, $D_1$ and $D_2$, $E_1$ and $E_2$.

Configuration A in Fig. 2 is uniquely defined by its two-point function, and therefore is
only trivially degenerate. On grids with $N$ points the total number of translations is $N$; the
number of rotations is 1, 2 or 4, depending on the rotational symmetry of the configuration;
and the number of mirror configurations is 1 or 2, depending on its chirality. Owing to the
rotational symmetry and achirality of configuration A, only translation contributes to its degeneracy,
which is therefore $\Omega_0 = N$. In the case of configuration B, the two possible orientations
contribute an extra factor 2, i.e., $\Omega_0 = 2N$.



Configurations $C_1$ and $C_2$ are the "Kite & Trapezoid" examples discussed in Refs. [20, 26, 27],
which are non-trivially degenerate. In this case, $\Omega_0 = 2 \times 4 \times N$, where the factor 2
is the non-trivial contribution, and the factor 4 accounts for the possible orientations.

Configurations $D_1$ and $D_2$ are also non-trivially degenerate. Configuration $D_2$ is, however,
chiral so it has to be counted twice. This leads to $\Omega_0 = (1 + 2) \times 4 \times N$. Finally, non-trivially
degenerate configurations $E_1$ and $E_2$ are both chiral. This leads to $\Omega_0 = (2 + 2) \times 4 \times N$.
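The complete enumeration used for these small systems can be sketched as follows. This is a brute-force illustration, assuming that two configurations on the same grid with the same number of black pixels have the same $S_2(r)$ exactly when their multisets of pairwise periodic distances coincide; the names `signature` and `degeneracies` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def signature(black, L):
    """Sorted multiset of squared periodic distances between black pixels."""
    sig = []
    for (x1, y1), (x2, y2) in combinations(black, 2):
        dx, dy = abs(x1 - x2), abs(y1 - y2)
        dx, dy = min(dx, L - dx), min(dy, L - dy)   # periodic boundaries
        sig.append(dx * dx + dy * dy)
    return tuple(sorted(sig))

def degeneracies(L, n1):
    """Map each two-point signature to the number of configurations sharing it."""
    cells = [(x, y) for x in range(L) for y in range(L)]
    return Counter(signature(c, L) for c in combinations(cells, n1))

deg = degeneracies(4, 3)          # 3 black pixels on a 4 x 4 periodic grid
print(len(deg), "distinct two-point functions among", sum(deg.values()), "configurations")
```

Each value in `deg` is the total degeneracy (trivial plus non-trivial) of one two-point function. The same call with `L = 8` and `n1 = 4` visits all 635376 configurations of the grid used in Fig. 2, which is still feasible but noticeably slower in pure Python.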



B. Intermediate-system-size microstructures: Monte Carlo analysis 



The complete enumeration of degenerate microstructures is intractable for systems even
barely larger than those represented in Fig. 2. In the present section, we discuss a Monte
Carlo (MC) algorithm for estimating $\Omega_0$, which we introduced previously [40]. It can be
applied to larger systems.

The approach is based on a general MC algorithm for estimating the density of states
(DOS) developed by Wang and Landau [50, 51] and further improved and analyzed by others
[54]. The algorithm has been applied to a host of problems in condensed matter physics
[55], in biophysics [56], and in logic [57]. The DOS $\Omega(E)$ is defined as the number of states
having energy equal to $E$. The logarithm of $\Omega(E)$ is equal to the entropy calculated in the
microcanonical ensemble associated with Eq. (1). The ground-state degeneracy $\Omega_0$ is the
value taken by $\Omega(E)$ for $E = 0$.

A canonical Monte Carlo simulation with transition probability given by Eq. (2) leads
the system to visit any energy with a probability $p(E) \sim \exp(-E/T)$ [58]. The algorithm
of Wang and Landau is based on the observation that a transition probability of the form

\[ p_{i \to j} = \min\left\{ 1, \frac{\Omega(E_i)}{\Omega(E_j)} \right\} \tag{4} \]

would lead the system to visit all energies with equal probability. The density of states $\Omega(E)$
is, however, unknown so the algorithm is iterative.

The starting value is set to $\Omega(E) = 1$ for all energies, and the system is allowed to evolve
according to Eq. (4), while updating a histogram $H(E)$. Each time an energy is visited
the corresponding bin is updated, $H(E) \to H(E) + 1$, and the estimated density of states is
updated according to $\Omega(E) \to F \times \Omega(E)$, where $F$ is a numerical factor larger than 1. The
evolution continues according to Eq. (4) with the updated value of $\Omega(E)$. The evolution
is stopped when a flat histogram is obtained. At this point, the histogram $H(E)$ is reset
to 0, $F$ is reduced to a value closer to 1, and the evolution starts over again. The entire
procedure is repeated until $F$ falls below a prescribed threshold close to 1. Algorithmic details
are provided in the Supplementary Material [59].



The accuracy of the MC algorithm was tested by applying it to the microstructures of
Fig. 2. The results are plotted in Fig. 3 in the form of cumulative DOS

\[ N_\Omega(E) = \sum_{\epsilon \le E} \Omega(\epsilon) . \tag{5} \]

The MC algorithm provides $\Omega(E)$ only to within an unknown multiplicative constant, which
is determined by requiring the sum of $\Omega(E)$ over all energies to equal the total number of
configurations $\Omega_{\mathrm{tot}}$. The latter is equal to the number of different ways in which $N_1$ black
pixels can be chosen among a total of $N$ possible pixels, i.e.,

\[ \Omega_{\mathrm{tot}} = \binom{N}{N_1} . \tag{6} \]



The cumulative DOS plotted in Fig. 3 satisfies $N_\Omega(E \to \infty) = \Omega_{\mathrm{tot}}$ and $N_\Omega(E \to 0) = \Omega_0$.
Three independent MC estimations have been calculated for each microstructure in Fig. 2,
yielding three independent estimates of $\Omega_0$. The results are: 66 ± 7 for configuration A
compared to the exact value 64; 140 ± 6 for configuration B compared to 128; 500 ± 68
for configuration C compared to 512; 769 ± 18 for configuration D compared to 768; and
991 ± 85 for configuration E compared to 1024. The exact values are those calculated in
Sec. II A with $N = 64$. The MC estimates agree with the exact values to within 10%.

FIG. 3. (Color online) Cumulative DOS associated with the reconstruction of configurations A
(□), C (o), and E (o) of Fig. 2.
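The flat-histogram iteration of Eq. (4) and the normalization by Eq. (6) can be sketched in Python as follows. This is an illustrative sketch, not the algorithm of the Supplementary Material: the function names, the 80% flatness criterion, the halving of $\ln F$ at each stage, the step cap, and the interface in which `energy` maps a set of black cell indices to a hashable energy value are all assumptions. The estimate is kept as $\ln \Omega(E)$ to avoid overflow.

```python
import math
import random

def wang_landau(energy, n_cells, n1, ln_f_final=1e-3, flat=0.8,
                check_every=2000, max_steps=2_000_000, seed=0):
    """Estimate ln Omega(E), up to an additive constant."""
    rng = random.Random(seed)
    cells = list(range(n_cells))
    rng.shuffle(cells)
    black, white = cells[:n1], cells[n1:]
    e = energy(frozenset(black))
    ln_g, hist, ln_f = {e: 0.0}, {e: 0}, 1.0
    for step in range(1, max_steps + 1):
        i, j = rng.randrange(len(black)), rng.randrange(len(white))
        black[i], white[j] = white[j], black[i]       # trial move
        e_new = energy(frozenset(black))
        if e_new not in ln_g:                         # newly discovered energy
            ln_g[e_new] = min(ln_g.values())
            hist[e_new] = 0
        # Eq. (4): accept with probability min{1, Omega(E_i) / Omega(E_j)}
        if math.log(1.0 - rng.random()) < ln_g[e] - ln_g[e_new]:
            e = e_new
        else:
            black[i], white[j] = white[j], black[i]   # reject: undo the move
        ln_g[e] += ln_f                               # Omega(E) -> F * Omega(E)
        hist[e] += 1                                  # H(E) -> H(E) + 1
        if step % check_every == 0:
            mean = sum(hist.values()) / len(hist)
            if min(hist.values()) >= flat * mean:     # histogram is flat
                hist = dict.fromkeys(hist, 0)
                ln_f /= 2.0                           # F -> sqrt(F)
                if ln_f < ln_f_final:
                    break
    return ln_g

def normalize(ln_g, ln_total):
    """Shift ln Omega(E) so that the Omega(E) sum to exp(ln_total)."""
    m = max(ln_g.values())
    ln_sum = m + math.log(sum(math.exp(v - m) for v in ln_g.values()))
    return {e: v - ln_sum + ln_total for e, v in ln_g.items()}

# Toy check: 3 black cells on a ring of 8, energy = number of adjacent black pairs.
ring = lambda s: sum(((c + 1) % 8) in s for c in s)
ln_omega = normalize(wang_landau(ring, 8, 3), math.log(math.comb(8, 3)))
print({e: round(math.exp(v), 1) for e, v in sorted(ln_omega.items())})
```

For this toy energy the exact density of states is 16, 32 and 8 configurations at energies 0, 1 and 2, which gives a reference against which the stochastic estimate can be compared.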

Figure 4 shows MC estimates of the density of states for larger microstructures, with
$N_1 = 13$ on an $8 \times 8$ grid. The microstructures are qualitatively similar to those of Fig. 1,
namely a single disk, hard disks, and a Poisson point process, all under periodic boundary
conditions. In the case of a single disk, the MC estimation provides the value $\Omega_0 = 58 \pm 8$,
corresponding to the 64 possible translations. This confirms that the disk is only trivially
degenerate. By contrast, the value found for the Poisson point process is $\Omega_0 = (11 \pm 1) \times 10^6$,
which is orders of magnitude larger than any possible trivial contribution from translation
and rotation. In the case of the hard disks, we find $\Omega_0 = (23 \pm 4) \times 10^3$.

C. Large-system-size microstructures 

The MC algorithm does not converge for systems larger than about $10 \times 10$ pixels. With
larger systems the criterion for flat histograms is rarely reached, even with as many as $10^9$ MC
steps. Moreover, when flat histograms are indeed obtained, the estimated value of $\Omega_0$ is much
smaller than 1, which shows that the algorithm explores only a small fraction of the complete
configuration space. These numerical difficulties are consistent with previous observations
that flat-histogram algorithms have a convergence time that increases exponentially with
system size [52].

FIG. 4. (Color online) Top, from left to right: single disk, hard disks, and Poisson point process
realization; Bottom: cumulative DOS associated with their reconstruction from $S_2(r)$ in the case
of the disk (□), hard disks (o), and the Poisson point process (o).



It is therefore difficult to estimate the $S_2$-degeneracy of systems as large as the one shown
in Fig. 1, except in the particular case where the microstructure is only trivially degenerate.
It has to be stressed that reconstructing exactly a degenerate microstructure is very unlikely.
Therefore, whenever a reconstruction leads to a translated, rotated, and/or inverted version
of the target, this can be considered as very strong evidence that the microstructure is only
trivially degenerate. In the remainder of the paper, we shall refer to a microstructure as
being non-degenerate whenever it has only a trivial degeneracy.

In continuous Euclidean space under periodic boundary conditions, an example of
non-degenerate microstructure is provided by the single sphere (composed of a large number
of pixels). This results from the observation that $S_2(0)$ is equal to the volume fraction of the
solid phase and that the negative slope of $S_2(r)$ for $r \to 0$ is proportional to its surface area
[29]. A sphere is non-degenerate because it is the microstructure that realizes the lowest
possible surface area for a given volume fraction: the two-point correlation function of any
microstructure other than a single sphere would have a larger slope at the origin, which
would result in a positive energy according to Eq. (1).

This observation can be expressed in a way that generalizes to discrete microstructures:
for a given number of black pixels $N_1$, a single sphere is non-degenerate because it is the
microstructure that realizes the largest value of $S_2(\epsilon)$, where $\epsilon$ is a very small distance.
Similarly, any configuration with $N_1 = 13$ other than the disk of Fig. 4 has a smaller value
of $P(\sqrt{2})$, where it is to be recalled that $P(r)$ is the number of pairs of points with distance
$r$. The same applies to configuration A of Fig. 2, which is not a disk: that particular
microstructure is non-degenerate because any other configuration with $N_1 = 4$ has a smaller
value of $P(3)$. The origin of the degeneracy of hard-disk systems is touched on in Sec. VI.

FIG. 5. (Color online) From top to bottom: reconstructions of a crystal and of two polycrystals with
decreasing crystallite sizes. In each case, the target (left) and reconstructed (right) microstructures
are shown. The target (•) and reconstructed (solid line) correlation functions, plotted against $r$ (pixels),
are indistinguishable on the scale of the figure. The size of the grid is $128 \times 128$ pixels under
periodic boundary conditions, and $N_1 = 3000$.

The analysis of non-degeneracy in terms of extremal values of $P(r)$ leads to non-intuitive
results. When microstructures are defined on a grid, distances and orientations are not
independent: for instance, a pair of points at a distance $\sqrt{8}$ from one another is necessarily
oriented at 45° with respect to both axes. A very anisotropic microstructure such as the
crystal on the top of Fig. 5 minimizes $P(r)$ for a set of distances corresponding to orientations
orthogonal to the stripes. The figure clearly shows that $S_2(r)$ vanishes for a set of well-defined
distances. It should therefore not be surprising that such a highly anisotropic microstructure
is non-degenerate. The non-degeneracy of the crystal is confirmed by the fact that the
reconstructed microstructure in Fig. 5 is a translated and rotated copy of the target. The
vertical discontinuity in the middle of the reconstruction results simply from the target not
having the same periodicity as the box.

When a large crystal in a periodic box is split into a collection of randomly oriented
smaller crystallites (Fig. 5, middle and bottom rows), its anisotropy is reduced and there are
no longer values of $r$ at which $P(r)$ is extremal. Accordingly the reconstruction becomes
less accurate, which means that the microstructure becomes more degenerate. A more
quantitative analysis of this issue is provided in Sec. V.



III. CHARACTERIZATION OF THE ENERGY LANDSCAPE 



A. The Structure of Configuration Space $\mathcal{C}$

The complete configuration space $\mathcal{C}$ of two-phase microstructures with $N$ pixels is the
set of vertices of an $N$-dimensional hypercube [40]. This results from the properties of
the indicator vector, whose components can take only the values 0 and 1. Moving along a given
$N$-dimensional direction (along an edge of the hypercube) is equivalent to changing the
color of a single pixel from white to black or from black to white. In the example of Fig. 6,
any movement along the fourth dimension (joining the outer and inner cubes) corresponds
to changing the upper-left pixel.

In the situation relevant to reconstruction, not all the vertices of the hypercube are
accessible because the number of black pixels is kept constant, i.e.,

\[ \sum_{i=1}^{N} I(i) = N_1 , \tag{7} \]

which means that all realizable microstructures lie on the intersection of the hypercube with
a hyperplane. Once a target correlation function $\hat{S}_2(r)$ is specified, each vertex is assigned
an energy through Eq. (1).

What we refer to as the energy landscape is the set of values taken by the energy functional
$E$ on the vertices of the $N$-dimensional hypercube that lie on this hyperplane. A reconstruction
consists in exploring the energy landscape until a vertex is found with $E = 0$. The DOS $\Omega(E)$
determined in Sec. II B is the number of vertices having a given energy $E$. The problem we
address in this section is that of the spatial variability of $E$ in configuration space $\mathcal{C}$. This
analysis is motivated by the observation, in many fields of physics, that systems with large
ground-state degeneracies generally have a rough energy landscape [60, 61]. If we can characterize
the roughness of the energy landscape in terms of $S_2(r)$ we can estimate the ground-state
degeneracy $\Omega_0$.

FIG. 6. (Color online) The configuration space $\mathcal{C}$ of a two-phase microstructure is an $N$-dimensional
hypercube on which a Hamming distance can be defined. Any move along an $N$-dimensional direction
corresponds to changing the color of a particular pixel. In the case of a $2 \times 2$ microstructure the
configuration space is a tesseract, with the fourth dimension represented as the edges joining the
outer and inner cubes (corresponding to the upper-left pixel).

In order to characterize the spatial variability of $E$ in configuration space, it is necessary
to define a distance. A natural choice is the Hamming distance, which counts the number of
edges between any two vertices. The Hamming distance within the hyperplane defined by
Eq. (7) takes only even values. The distance $d[A, B]$ between two microstructures A and B
is therefore defined as half the Hamming distance

\[ d[A,B] = \frac{1}{2} \sum_{i=1}^{N} \left| I_A(i) - I_B(i) \right| , \tag{8} \]

where $I_A(i)$ and $I_B(i)$ are the indicator vectors. In real space, this distance $d$ is the smallest
number of Monte-Carlo-like black pixel displacements that are required to pass from A to
B. The largest possible distance is $d = N_1$ when the two microstructures have no pixel in
common [62].
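A direct transcription of Eq. (8) makes the "number of pixel displacements" interpretation concrete. The function name `distance` and the representation of a microstructure as a flat list of 0/1 values are assumptions made for this short example.

```python
def distance(a, b):
    """Half the Hamming distance between two indicator vectors, Eq. (8)."""
    if len(a) != len(b) or sum(a) != sum(b):
        raise ValueError("configurations must share N and N1")
    return sum(abs(x - y) for x, y in zip(a, b)) // 2

A = [1, 1, 0, 0]
B = [1, 0, 1, 0]   # one black pixel displaced with respect to A
C = [0, 0, 1, 1]   # no black pixel in common with A

print(distance(A, B), distance(A, C))
```

The check on `sum(a) != sum(b)` enforces the constraint of Eq. (7), without which the Hamming distance need not be even.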
FIG. 7. (Color online) Random walks in the configuration space $\mathcal{C}$ of a single (discretized) disk,
hard disks, and a Poisson point process (left to right), all with $N_1 = 200$ pixels under periodic
boundary conditions. The irregular curves are the energies visited during particular realizations
of the random walk, and the thick black line is the average value calculated analytically through
Eq. (16). The microstructures shown on top are the starting ground states and the configurations
reached after $n = 80$ and $n = 160$ random moves.

B. Exploring the Energy Landscape with a Random Walk 

The energy landscape can be characterized analytically through a random walk in
configuration space. This is illustrated in Fig. 7. Starting from a ground state of a reconstruction
problem, with $S_2(r) = \hat{S}_2(r)$, the system moves randomly to any configuration at Hamming
distance $d = 1$ from the current configuration. This is equivalent to a standard Metropolis
Monte Carlo with $T \to \infty$ [see Eq. (2)]. When the number of moves $n$ increases, the random
walk explores the configuration space $\mathcal{C}$ over increasingly large Hamming distances $d$ from
the starting ground state.

The rate at which the average energy $\langle E \rangle^{(n)}$ visited by the random walk increases with
the number $n$ of moves characterizes the energy landscape of a given reconstruction problem.
In the examples of Fig. 7, the energy curve of the Poisson point process is steeper than that
of the single disk, which suggests smaller basins. We now proceed to analytically calculate
the values of $\langle E \rangle^{(n)}$ as a function of the characteristics of the starting ground state.
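Before the analytical calculation, the averaged energy profile can be estimated numerically by brute force. The sketch below is an assumption-laden illustration: the names `s2`, `energy` and `mean_walk_energy`, the number of walks and the seed are hypothetical, distances are indexed by the integer $r^2$, and the $r = 0$ term is omitted from the energy because $S_2(0)$ is fixed by the number of black pixels.

```python
import random
from itertools import combinations

def s2(black, L):
    """S2(r) on an L x L periodic grid, keyed by squared distance r^2 > 0."""
    omega, pairs = {}, {}
    for dx in range(L):
        for dy in range(L):
            k = min(dx, L - dx) ** 2 + min(dy, L - dy) ** 2
            omega[k] = omega.get(k, 0) + 1
    for (x1, y1), (x2, y2) in combinations(black, 2):
        dx, dy = abs(x1 - x2), abs(y1 - y2)
        k = min(dx, L - dx) ** 2 + min(dy, L - dy) ** 2
        pairs[k] = pairs.get(k, 0) + 1
    return {k: 2 * pairs.get(k, 0) / (L * L * w) for k, w in omega.items() if k > 0}

def energy(black, target, L):
    """Eq. (1), summed over all nonzero distances available on the grid."""
    current = s2(black, L)
    return sum((target[k] - current[k]) ** 2 for k in target)

def mean_walk_energy(ground, L, n_moves, n_walks=200, seed=0):
    """Average energy after 0..n_moves random displacements from `ground`."""
    rng = random.Random(seed)
    target = s2(ground, L)
    ground_set = set(ground)
    cells = [(x, y) for x in range(L) for y in range(L)]
    totals = [0.0] * (n_moves + 1)
    for _ in range(n_walks):
        black = list(ground)
        white = [c for c in cells if c not in ground_set]
        for n in range(n_moves + 1):
            totals[n] += energy(black, target, L)
            i, j = rng.randrange(len(black)), rng.randrange(len(white))
            black[i], white[j] = white[j], black[i]   # one random move
    return [t / n_walks for t in totals]

square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(mean_walk_energy(square, 6, n_moves=5, n_walks=50))
```

The first entry of the returned profile is zero by construction, since every walk starts in a ground state; the initial rise of the following entries is the numerical counterpart of the steepness discussed in the text.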

The only a priori information about the ground states of a given reconstruction problem
is their one-point and two-point characteristics: $\phi = N_1/N$ and $S_2(r)$. The other characteristics,
in particular the higher-order correlation functions, may differ significantly from
one ground state to another. Let us assume for now that the starting ground state of the
random walk is perfectly known through its indicator vector $I(i)$.

Instead of using $S_2(r)$, it is convenient to use the equivalent autocovariance $\chi(r)$, defined
as

$$\chi(r) = S_2(r) - \phi^2 , \qquad (9)$$

where $\phi = N_1/N$ is the fraction of black pixels. The average energy after $n$ random moves
can be written in terms of $\chi(r)$ as

$$\langle E\rangle^{(n)} = \sum_r \left[ \hat\chi^2(r) + \langle\chi^2(r)\rangle^{(n)} - 2\hat\chi(r)\,\langle\chi(r)\rangle^{(n)} \right] , \qquad (10)$$

where we have used the notations $\hat\chi(r) = \hat S_2(r) - \phi^2$, which is associated with the ground
state, and $\langle\cdot\rangle^{(n)}$ for any average value at step $n$. At each step of the random walk there are
$N_0 N_1$ possible moves, so that the total number of possible walks of length $n$ is $(N_0 N_1)^n$; the
averages $\langle\cdot\rangle^{(n)}$ are calculated over all these possible paths. We now calculate successively
$\langle\chi(r)\rangle^{(n)}$ and $\langle\chi^2(r)\rangle^{(n)}$, which are required to calculate $\langle E\rangle^{(n)}$ through Eq. (10).

When a black pixel $p$ is moved to a position $q$ previously occupied by a white pixel, the
indicator vector becomes

$$I'(i) = I(i) + \delta(i,q) - \delta(i,p) , \qquad (11)$$

where $\delta(i,q)$ is the Kronecker delta. Using the definition of $S_2(r)$, Eq. (3), the
autocovariance is then found to become

$$\chi'(r;p,q) = \chi(r) + \frac{2}{N\omega_r}\Big\{ \delta(r,0) - D_r(p,q) + \sum_i I(i)\left[ D_r(q,i) - D_r(p,i) \right] \Big\} \qquad (12)$$

for that particular move. The average value of $\chi'(r;p,q)$ is then simply calculated as

$$\langle\chi'(r)\rangle = \frac{1}{N_0 N_1}\sum_{p,q} I(p)\,[1 - I(q)]\,\chi'(r;p,q) , \qquad (13)$$

where the factor $I(p)\,[1 - I(q)]/(N_0 N_1)$ accounts for the fact that $p$ and $q$ are uniformly
distributed over the black and white phases, respectively.

Combining Eqs. (12) and (13), the average autocovariance $\chi(r)$ after a single random
move is found to be

$$\langle\chi'(r)\rangle = \alpha\,\chi(r) + O(N^{-2}) \qquad (14)$$

with $\alpha = 1 - 2N/(N_0 N_1)$. In Eq. (14) a term of the order of $N^{-2}$ has been neglected, which
is justified for large values of $N$. The complete equation can be found in the Supplementary
Material [59]. Equation (14) is valid only for $r > 0$. The value for $r = 0$ depends only on the
fraction of black pixels $\phi$, $\chi(0) = \phi(1 - \phi)$, and it is therefore a constant during the random
walk.

Because each random move is independent of the previous one, the analysis leading
to Eq. (14) can be repeated in an iterative way. Taking into account that the starting
two-point function is $\hat\chi(r)$, this leads to the following simple result:

$$\langle\chi(r)\rangle^{(n)} = \hat\chi(r)\,\alpha^n . \qquad (15)$$

In the course of the random walk, the average two-point function therefore converges towards
that of a Poisson point process, with $\chi(r) = 0$ for all $r > 0$. The convergence is
exponential in the number of random moves and it occurs in about $N_0 N_1/(2N)$ moves.
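
As a worked example of Eq. (15), the decay factor and the characteristic number of moves can be evaluated for the system size of Fig. 7 ($N = 1024$, $N_1 = 200$). The helper names below are hypothetical; only the expression $\alpha = 1 - 2N/(N_0 N_1)$ is taken from the text.

```python
def alpha(n_pixels, n_black):
    """Decay factor of Eq. (14): alpha = 1 - 2N / (N0 * N1)."""
    n_white = n_pixels - n_black
    return 1.0 - 2.0 * n_pixels / (n_white * n_black)

def mean_autocovariance(chi_hat, n_moves, n_pixels, n_black):
    """Average autocovariance after n moves, Eq. (15), for r > 0."""
    return chi_hat * alpha(n_pixels, n_black) ** n_moves

N, N1 = 1024, 200
a = alpha(N, N1)
tau = (N - N1) * N1 / (2 * N)          # characteristic number of moves, N0 * N1 / (2N)
remaining = mean_autocovariance(1.0, 80, N, N1)
```

Since $\alpha = 1 - 1/\tau$ with $\tau = N_0 N_1/(2N)$, the autocovariance drops by a factor close to $1/e$ every $\tau$ moves, which is about 80 moves for these values.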

The determination of $\langle\chi^2(r)\rangle^{(n)}$ proceeds along the same lines, but it is more involved;
the details can be found in the Supplementary Material [59]. When the expression obtained
for $\langle\chi^2(r)\rangle^{(n)}$ is introduced in Eq. (10), the value of the average energy eventually takes the
form

$$\langle E\rangle^{(n)} = E_\infty + (E_2 - E_1)\,\alpha^n + E_3\,\beta^n + (E_1 - E_2 - E_3 - E_\infty)\,\gamma^n , \qquad (16)$$

where $E_1$, $E_2$, $E_3$ and $E_\infty$ are constants that characterize the starting ground state. The
constants $\beta$ and $\gamma$ depend only on $N$ and $N_1$,
and $\alpha$ has the same meaning as in Eq. (14).

The constants $E_1$ and $E_\infty$ depend only on two-point characteristics of the ground states.
They are written as

$$E_1 = 2\sum_r \hat\chi^2(r) \qquad (18)$$



and 



\2 



^ = Ef(r ,4^, (19) 



where the sum is over all the distances that are used in the definition of the energy. As a
consequence of Eq. (16), $E_\infty$ is the value towards which $\langle E\rangle^{(n)}$ converges for large values of
$n$. Since the random walk is ergodic, any configuration has the same probability of being
visited in the long run. Therefore, $E_\infty$ is the average energy calculated over the entire
configuration space, which we refer to hereafter simply as $\langle E\rangle$.

The main contribution to $\langle E\rangle$ is the term in $\hat\chi^2(r)$, which is small for disordered microstructures. This
term vanishes in the extreme case of a Poisson point process, for which the only contribution
left is of order $1/N$, according to Eq. (19). The shifting of the average energy towards lower
values with increasing disorder is clear in Figs. 4 and 7.

The other two constants, $E_2$ and $E_3$, in the expression of $\langle E\rangle^{(n)}$ depend on more than
the two-point function of the ground state. They are given by



:i - 2 * )2 x(r) (20) 



and 



£3 = ^(1-20)0^ 



c {r) -a {r) + — X [r 
20) 2 



X(r) , (21) 



where the functions $\sigma^2(r)$ and $\sigma_C^2(r)$ are defined as

$$\sigma^2(r) = \frac{1}{N}\sum_s \Big\{ \frac{1}{\omega_r}\sum_i I(i)\,D_r(i,s) \Big\}^2 - \phi^2 \qquad (22)$$

and

$$\sigma_C^2(r) = \frac{1}{N_1}\sum_s I(s)\,\Big\{ \frac{1}{\omega_r}\sum_i I(i)\,D_r(i,s) \Big\}^2 - \left( \frac{S_2(r)}{\phi} \right)^2 . \qquad (23)$$

We postpone to Sec. IV the detailed discussion of the structural meaning of $\sigma^2(r)$ and $\sigma_C^2(r)$.
We should only mention here that $\sigma^2(r)$ can be expressed in terms of $S_2(r)$ and that it is
therefore common to all ground states (see the Supplementary Material [59]). By contrast,
$\sigma_C^2(r)$ depends on three-point statistics and may differ from one ground state to another.

The black lines in Fig. 7 have been obtained from Eq. (16) with the constants $E_1$, $E_2$, $E_3$ and $E_\infty$
evaluated at the starting ground state. The analysis captures the essential features of the
random walk, in particular the steepness of the $\langle E\rangle^{(n)}$ versus $n$ curves.

An important quantity for the rest of the analysis is the average energy of all configurations
at Hamming distance $d = 1$ from the ground state. Setting $n = 1$ in Eq. (16) and
neglecting terms of the order of $N^{-3}$ leads to

$$\langle E\rangle^{(1)} = \frac{4N}{N_0 N_1^2}\sum_r \left[ \hat\chi^2(r) + \phi^2\sigma^2(r) + (1 - 2\phi)\,\phi^2\sigma_C^2(r) \right] . \qquad (24)$$

The first two contributions are global characteristics of the configuration space, which depend
only on $S_2(r)$ and are therefore common to all ground states. By contrast, the contribution
from $\sigma_C^2(r)$ may a priori differ significantly from one ground state to another. We discuss
this point in detail in Sec. IV.

C. The Energy Profiles of Individual Basins 

The average energy $\langle E\rangle^{(n)}$ visited after exactly $n$ random moves is, strictly speaking, a property
of the random walk and not only of the energy landscape. The aim of the present section is
to use Eq. (16) to calculate the average energy of all microstructures at a given Hamming
distance $d$ from a given ground state.

A random walk of length $n$ can reach any microstructure at Hamming distance $d \le n$
from the ground state. Let us call a $d$-state a microstructure at distance $d$ from the ground
state, and $\nu_n(d)$ the fraction of all the random walks of length $n$ that end on a $d$-state. The
average energy $E(d)$ of all $d$-states is related to $\langle E\rangle^{(n)}$ through

$$\langle E\rangle^{(n)} = \sum_{d=0}^{n} \nu_n(d)\,E(d) . \qquad (25)$$

If the values of $\nu_n(d)$ were known, this relation could in principle be inverted to estimate
$E(d)$ starting from Eq. (16).

FIG. 8. (Color online) Distribution of Hamming distances to the ground state $\nu_n(d)$ for increasing
number $n$ of steps in the random walk. The figure is for $N = 1024$ and $N_1 = 200$, relevant to Fig. 7.

The distribution $\nu_n(d)$ can be obtained by noticing that the random walk is a Markov
process, and by calculating the transition probabilities between the various $d$-states. In real
space, the Hamming distance is the minimum number of pixel displacements that is necessary
to make the state identical to the ground state. Therefore, in an $i$-state, the black phase is
identical to the ground state but for $i$ holes, and the white phase is identical to the ground
state but for $i$ extra pixels. Starting from an $i$-state at step $n$, there are $i^2$ ways to reach an
$(i-1)$-state at step $n + 1$. These correspond to the number of different ways to take one of
the $i$ extra pixels and place it into one of the $i$ holes. The transition probability is therefore

$$P_{i\to(i-1)} = i^2/(N_0 N_1) , \qquad (26)$$

where the denominator is the total number of possible moves. There are $i(N - 2i)$ different
ways to reach another $i$-state. These correspond to the $i(N_1 - i)$ different ways of moving
the holes within the black phase, plus the $i(N_0 - i)$ different ways of moving the extra pixels
within the white phase. The transition probability is therefore

$$P_{i\to i} = i(N - 2i)/(N_0 N_1) . \qquad (27)$$

Finally, there are $(N_0 - i)(N_1 - i)$ different ways of reaching an $(i+1)$-state, which correspond
to the different ways of taking a black pixel and putting it in the white phase. This leads to

$$P_{i\to(i+1)} = (N_0 - i)(N_1 - i)/(N_0 N_1) . \qquad (28)$$

The three transition probabilities add up to 1, which proves that all possibilities have been
considered.
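
The normalization of the three transition probabilities can be checked explicitly. Summing the three numbers of moves counted above and using $N = N_0 + N_1$ gives

```latex
\begin{align}
i^2 + i(N - 2i) + (N_0 - i)(N_1 - i)
  &= i^2 + iN - 2i^2 + N_0 N_1 - i(N_0 + N_1) + i^2 \\
  &= N_0 N_1 ,
\end{align}
```

which is the total number of possible moves, so that the probabilities sum to one for every $i$.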

The enumerated probabilities define a tridiagonal transition matrix $A$ of size $(N_1 + 1) \times (N_1 + 1)$,
with elements $A(i,j) = P_{j\to i}$. The explicit form of $A$ is given in the Supplementary
Material [59]. Writing the values $\nu_n(d)$ in a vector as $\underline{\nu}_n = [\nu_n(0), \nu_n(1), \ldots, \nu_n(N_1)]^T$
enables us to write $\underline{\nu}_{n+1} = A\,\underline{\nu}_n$. The general solution is therefore

$$\underline{\nu}_n = A^n\,\underline{\nu}_0 , \qquad (29)$$

where $\underline{\nu}_0 = [1, 0, \ldots, 0]^T$ is the trivial distribution of Hamming distances in the ground
state.

FIG. 9. (Color online) Energy profiles of the microstructures shown in Fig. 2. From left to right:
A, B, $C_1$ (o) and $C_2$ (+), $D_1$ (o) and $D_2$ (+), $E_1$ (o) and $E_2$ (+). Note that the profiles of $D_1$ and
$D_2$, as well as those of $E_1$ and $E_2$, are identical. The solid line is the approximate profile, common to all
ground states, calculated from $S_2(r)$ alone using Eq. (36).

The particular evolution of $\nu_n(d)$ for $N = 1024$ and $N_1 = 200$ is shown in Fig. 8. These
values are relevant to Fig. 7. For small values of $n$, the distribution is centered on the value
$d = n$. For large values of $n$, however, $\nu_n(d)$ converges towards an equilibrium distribution.
It is useful to note that although all states are accessible to the random walk after $n = N_1$
moves, the energy $\langle E\rangle^{(n)}$ continues to change for larger values of $n$ (see Fig. 7) because the
convergence of $\nu_n(d)$ is asymptotic.
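
The evolution of the distribution of Hamming distances can be sketched numerically from the three transition probabilities given above. The following is a minimal sketch with hypothetical function names, assuming $N_1 \le N_0$ so that the largest Hamming distance is $N_1$.

```python
import numpy as np

def transition_matrix(n_pixels, n_black):
    """Tridiagonal matrix with A[i, j] = P(j -> i), built from Eqs. (26)-(28)."""
    n_white = n_pixels - n_black
    moves = n_white * n_black                      # total number of possible moves
    size = n_black + 1
    A = np.zeros((size, size))
    for j in range(size):
        if j > 0:
            A[j - 1, j] = j * j / moves            # an extra pixel fills a hole
        A[j, j] = j * (n_pixels - 2 * j) / moves   # a hole or an extra pixel is shifted
        if j < n_black:
            A[j + 1, j] = (n_white - j) * (n_black - j) / moves  # a new hole is created
    return A

def distance_distribution(n_pixels, n_black, n_moves):
    """Rows are the distributions after 0, 1, ..., n_moves random moves."""
    A = transition_matrix(n_pixels, n_black)
    nu = np.zeros(n_black + 1)
    nu[0] = 1.0                                    # the walk starts in the ground state
    history = [nu]
    for _ in range(n_moves):
        nu = A @ nu
        history.append(nu)
    return np.array(history)

nu = distance_distribution(1024, 200, 300)
```

Each column of the matrix sums to one, so the total probability is conserved at every step, and the first move always leads to $d = 1$.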

Using the known values of $\nu_n(d)$ and of $\langle E\rangle^{(n)}$, Eq. (25) can be inverted for $n = 1, 2, \ldots, N_1$,
yielding the values of $E(d)$. The procedure is illustrated in Fig. 9 for the small-system-size
configurations of Fig. 2. The results are plotted in the normalized form $E(d)/\langle E\rangle$, which
we refer to as the energy profiles.
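
One possible way to carry out this inversion is forward substitution, because a walk of length $n$ cannot end at a distance larger than $n$ and the linear system is therefore triangular. The sketch below uses hypothetical function names and a synthetic profile $E(d) = d^2$ as a self-consistency check; it is not claimed to be the numerical scheme used for Fig. 9.

```python
import numpy as np

def invert_profile(nu, e_walk):
    """Solve <E>^(n) = sum_{d <= n} nu[n, d] * E(d) for E(d) by forward substitution.

    nu[n, d]  : fraction of n-step walks ending at Hamming distance d
    e_walk[n] : average energy after n moves, with e_walk[0] = 0 in the ground state
    """
    n_max = len(e_walk) - 1
    profile = np.zeros(n_max + 1)
    for n in range(1, n_max + 1):
        known = np.dot(nu[n, :n], profile[:n])
        profile[n] = (e_walk[n] - known) / nu[n, n]
    return profile

def walk_distribution(n_pixels, n_black):
    """nu[n, d] for n = 0..N1, from the transition probabilities of Eqs. (26)-(28)."""
    n_white = n_pixels - n_black
    moves = n_white * n_black
    d = np.arange(n_black + 1)
    A = (np.diag(d * (n_pixels - 2 * d) / moves)
         + np.diag(d[1:] ** 2 / moves, k=1)
         + np.diag((n_white - d[:-1]) * (n_black - d[:-1]) / moves, k=-1))
    nu = np.zeros((n_black + 1, n_black + 1))
    nu[0, 0] = 1.0
    for n in range(n_black):
        nu[n + 1] = A @ nu[n]
    return nu

# Synthetic check: choose a profile, generate the walk averages, recover the profile.
nu = walk_distribution(n_pixels=64, n_black=8)
true_profile = np.arange(9, dtype=float) ** 2
e_walk = nu @ true_profile
recovered = invert_profile(nu, e_walk)
```

The diagonal elements `nu[n, n]` decrease quickly with $n$, so for large $N_1$ this plain substitution may become numerically delicate and a more careful solver could be needed.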

The energy profiles describe quantitatively the average energy landscape surrounding any
particular ground state. They are all initially increasing curves that start from 0 and reach
values close to 1. The average energy for $d = 1$ is equal to $\langle E\rangle^{(1)}$, as calculated from Eq.
(24). For large Hamming distances the energy decreases again because microstructures with
large $d$ can be thought of as negative imprints of the ground state: for $d = N_1$ the points
that were occupied by black pixels in the ground state are all occupied by white pixels.

Figure 10 shows the energy profiles of the single disk, the hard disks, and the Poisson
point process of Fig. 4. When plotted on logarithmic scales (insets), the profiles are seen to
satisfy a power law of the type

$$E(d) = E(1)\,d^{\delta} \qquad (30)$$

for small values of $d$. When the resolution is increased, i.e., when $N$ is increased from $8^2$ to $32^2$
and $64^2$ while keeping $N_1/N$ constant, the profiles are shifted vertically (insets) but
the exponent of the power law persists. In the case of the reconstruction of the single disk,
the exponent $\delta$ is close to 2 ($\simeq 1.97$), and for the reconstruction of the Poisson point process
the exponent is close to 1 ($\simeq 0.92$). The energy profile of the hard disks is not a pure power
law: the exponent is that of a single disk for large Hamming distances and that of a Poisson
point process for shorter distances.

FIG. 10. (Color online) From left to right: energy profiles of the single disk, hard-disk, and
Poisson microstructures shown in Fig. 4. The symbols (o) are the exact values and the solid
lines are the approximate profile, common to all ground states, calculated from $S_2(r)$ alone using
Eq. (36). The exact values are plotted in insets on logarithmic scale, together with the profiles
of equivalent systems of larger sizes, namely with $L = 32$ (top) and $L = 64$ (bottom). The
microstructures used for the $L = 32$ profiles are those of Fig. 1. The approximate and exact
profiles are indistinguishable on the scale of the insets.
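
The exponent of such a power law can be estimated by a linear least-squares fit of $\log E$ against $\log d$. The sketch below uses synthetic data with made-up prefactors, chosen only to illustrate the quadratic and the linear cases; it does not reproduce the profiles of Fig. 10.

```python
import numpy as np

def fit_exponent(d, energy):
    """Least-squares fit of log E versus log d; returns (delta, E(1)) of E(d) = E(1) d**delta."""
    slope, intercept = np.polyfit(np.log(d), np.log(energy), 1)
    return slope, np.exp(intercept)

d = np.arange(1, 11)
# Synthetic profiles (illustrative values only): one quadratic, one linear in d.
delta_quadratic, e1_quadratic = fit_exponent(d, 3.0e-4 * d ** 2.0)
delta_linear, e1_linear = fit_exponent(d, 5.0e-3 * d ** 1.0)
```

For a profile that is not a pure power law, such as that of the hard disks, the fit would have to be restricted to a range of $d$ over which the slope is approximately constant.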

The different exponents $\delta$ for the Poisson point process and for the single disk hint at a
qualitative difference that can be understood with the hole and extra-pixel interpretation of the
Hamming distance $d$. In the case of the Poisson point process, the energy is proportional
to $d$. This means that any hole added to the ground state contributes additively to the
energy, which points to the absence of effective pixel-pixel interaction energy. By contrast,
the quadratic behavior for the single disk points to a collective contribution of the pixels to
the overall energy, which can be considered as the signature of a structure.

IV. ENERGY ROUGHNESS AS A PROXY FOR GROUND-STATE DEGENERACY



When comparing the energy profiles in Figs. 9 and 10 with the densities of states in Figs.
3 and 4, a striking correlation appears between the sizes of the basins and the ground-state
degeneracy $\Omega_0$. We observe that the smaller the basin, the more degenerate the reconstruction.
This observation is consistent with the fact that large ground-state degeneracies are
generally associated with rough energy landscapes [60, 61].

However, a major difference between the energy profiles and the ground-state degeneracy
is that the latter is a global characteristic of the reconstruction problem whereas the former are
specific to given ground states. For example, in the case of configuration C of Fig. 2, the
ground states $C_1$ (the Kite) and $C_2$ (the Trapezoid) have slightly different energy profiles
(Fig. 9). The main purpose of the present section is to provide an approximation for the
energy profile, common to all ground states, which can be calculated from $S_2(r)$ alone.

To understand how the energy profiles depend on the particular ground state, it is necessary
to analyze the structural meaning of the functions $\sigma^2(r)$ and $\sigma_C^2(r)$ defined by Eqs. (22)
and (23). We show in the Supplementary Material [59] that $\sigma^2(r)$ can be expressed in terms
of the two-point function $\chi(r)$ as

$$\sigma^2(r) = \sum_{l \le 2r} v(r,l)\,\omega_l\,\chi(l) , \qquad (31)$$

where $v(r,l)$ is a characteristic of the grid and of the boundary conditions. By definition,
all ground states have identical two-point statistics. The contribution of $\sigma^2(r)$ to the energy
profile is therefore common to all ground states. By contrast, it results from Eq. (23) that
$\sigma_C^2(r)$ is a sum of terms of the type

$$I(s)\,I(i)\,I(j)\,D_r(s,i)\,D_r(s,j) , \qquad (32)$$

which incorporate three-point statistics. Accordingly, the contribution of $\sigma_C^2(r)$ to the energy
profile may differ significantly from one ground state to another.

As a consequence of Eq. (32), the pixel configurations that contribute to $\sigma_C^2(r)$ are
isosceles triangles with apex $s$ and side length $r$. In the case of configuration C of Fig. 2,
the Kite ($C_1$) has two such triangles, one with $r = 1$ and the other with $r = \sqrt{5}$. On the
contrary, there is no isosceles triangle in the Trapezoid ($C_2$). It therefore results from Eq.
(24) that $\langle E\rangle^{(1)}$ is larger for $C_1$ than for $C_2$, in agreement with Fig. 9.

The energy profiles of $D_1$ and $D_2$ are identical although these two ground states are
different. This originates in the fact that in both ground states, there is an isosceles triangle
of side $r = 2$ and another one of side $r = \sqrt{10}$. The same explanation applies to $E_1$ and $E_2$.
In both ground states there is a single isosceles triangle with r = \^E. 

Despite these differences between $\sigma^2(r)$ and $\sigma_C^2(r)$, the two functions have a strong similarity,
which can be made evident by the following probabilistic interpretation. Consider
the set of all pixels at distance $r$ from a given pixel $s$, which we refer to as the ring centered
on $s$. The fraction of black pixels in the ring can be written as

$$\phi_r = \frac{1}{\omega_r}\sum_{i=1}^{N} I(i)\,D_r(i,s) . \qquad (33)$$

When $s$ is chosen randomly among all pixels (black and white), $\phi_r$ is a random variable
having average value $\phi$. Equation (22) shows that $\sigma^2(r)$ is the variance of $\phi_r$. The function
$\sigma^2(r)$ can therefore be thought of as a generalized coarseness [63].

From this probabilistic perspective, the meaning of $\sigma_C^2(r)$ is equivalent to that of $\sigma^2(r)$, except that the
central pixel $s$ is not distributed randomly over the entire space but only over the black
pixels. In this case, the average of $\phi_r$ is the conditional probability that a pixel of the ring
is black, given that the central pixel is black too, i.e., $S_2(r)/\phi$. From Eq. (23), one sees that
$\sigma_C^2(r)$ is the variance of $\phi_r$ when the central pixel $s$ is randomly distributed over the black
phase. The function $\sigma_C^2(r)$ can therefore be thought of as a conditional coarseness.
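
The probabilistic interpretation can be made concrete with a short numerical sketch. The two coarseness functions are evaluated as the variance of the ring fraction over all pixels and over the black pixels only. The function names and the random test image are hypothetical, and the ring is taken as the set of grid points at squared Euclidean distance `r_squared` on a periodic square grid, which is an assumption about the discretization.

```python
import numpy as np

def ring_fraction(img, r_squared):
    """Fraction of black pixels on the ring of squared radius r_squared around every pixel."""
    half = img.shape[0] // 2
    offsets = [(dx, dy)
               for dx in range(-half + 1, half)
               for dy in range(-half + 1, half)
               if dx * dx + dy * dy == r_squared]
    if not offsets:
        raise ValueError("no grid point at this distance")
    total = sum(np.roll(img, shift=(dx, dy), axis=(0, 1)) for dx, dy in offsets)
    return total / len(offsets)

def coarseness(img, r_squared):
    """Return (sigma^2, sigma_C^2, S2) at the given squared distance."""
    phi = img.mean()
    ring = ring_fraction(img, r_squared)
    s2 = np.mean(img * ring)                                     # two-point probability
    sigma2 = np.mean(ring ** 2) - phi ** 2                       # variance over all pixels
    sigma2_c = np.mean(ring[img == 1] ** 2) - (s2 / phi) ** 2    # variance over black pixels
    return sigma2, sigma2_c, s2

# Hypothetical test image: 16 x 16 grid, each pixel black with probability 0.3.
rng = np.random.default_rng(1)
img = (rng.random((16, 16)) < 0.3).astype(int)
phi = img.mean()
sigma2, sigma2_c, s2 = coarseness(img, r_squared=5)
```

In this sketch the mean of the ring fraction over all pixels equals $\phi$ exactly, and its mean over the black pixels equals $S_2(r)/\phi$, so that both quantities are genuine variances.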

For small radii $r$, the values taken by $I(i)$ on the ring are highly correlated with the value
in the center, which implies $\sigma_C^2(r \to 0) = 0$. For large radii, the values on the ring and in
the center are not correlated at all and therefore $\sigma_C^2(r) \simeq \sigma^2(r)$. An example of $\sigma^2(r)$ and
$\sigma_C^2(r)$, calculated on the realization of a hard-disk system, is given in Fig. 11.

The similarity of the probabilistic interpretations of $\sigma_C^2(r)$ and of $\sigma^2(r)$, and their
mathematical equality for large values of $r$, suggest that it should be possible to find an
approximation of $\sigma_C^2(r)$ in terms of two-point functions only. This would enable us to
estimate a single approximate energy profile that would depend only on $S_2(r)$. That profile
would therefore be common to all ground states of a given reconstruction problem.

To find such an approximation, we observe that the terms between braces in the definitions
of $\sigma_C^2(r)$ and $\sigma^2(r)$ are identical and that they are all positive [see Eqs. (22) and (23)].
However, there are fewer terms in Eq. (23) because $I(s)$ can be equal to 0. One has
therefore

$$N_1\left( \sigma_C^2(r) + \frac{S_2^2(r)}{\phi^2} \right) \le N\left( \sigma^2(r) + \phi^2 \right) , \qquad (34)$$

which leads to the following upper bound for $\sigma_C^2(r)$:

$$\sigma_C^2(r) \le \sigma_U^2(r) = \frac{\sigma^2(r)}{\phi} + \frac{\phi^3 - S_2^2(r)}{\phi^2} , \qquad (35)$$

which depends only on two-point statistics.

FIG. 11. (Color online) Functions $\sigma^2(r)$ (top) and $\sigma_C^2(r)$ (bottom) of a hard-disk microstructure.
The inset of the bottom graph shows $\sigma_C^2(r)$ and its upper bound $\sigma_U^2(r)$. The dashed red line is the
approximation $\tilde\sigma_C^2(r)$ obtained through Eq. (36).

The inset of Fig. 11 compares $\sigma_C^2(r)$ to $\sigma_U^2(r)$ in the particular case of a hard-disk
microstructure. The upper bound $\sigma_U^2(r)$ is a good approximation of $\sigma_C^2(r)$ only for very
small $r$. We therefore propose the following approximation for $\sigma_C^2(r)$:

$$\tilde\sigma_C^2(r) = \left[ \frac{1}{\sigma^2(r)} + \frac{1}{\sigma_U^2(r)} \right]^{-1} , \qquad (36)$$

which is practically equal to $\sigma^2(r)$ for large $r$, as it should. Fig. 11 shows that $\tilde\sigma_C^2(r)$ is a
fair approximation of $\sigma_C^2(r)$ at all radii.

Using $\tilde\sigma_C^2(r)$ in place of $\sigma_C^2(r)$ enables us to calculate a single energy profile, based on
$\hat S_2(r)$ alone. The red curves in Figs. 9 and 10 have been calculated in that way: $\sigma^2(r)$ was
calculated rigorously from the target $\hat S_2(r)$ through Eq. (31), and $\sigma_C^2(r)$ was approximated
by Eq. (36). In the case of the larger microstructures shown in the insets of Fig. 10, the
approximate profiles are indistinguishable from the exact profiles on the scale of the figure.

FIG. 12. (Color online) Relation between ground-state degeneracy $\Omega_0/\Omega_{tot}$ and roughness of the
energy landscape $\tilde E(1)/\langle E\rangle$ calculated from $S_2(r)$ alone. The various microstructures are: disks
of different sizes (•), realizations of Poisson point processes of various densities (o), hard-disk
microstructures (x), the configurations of Fig. 2 with $N_1 = 4$ (□), as well as a configuration with
$N_1 = 2$ (*). The black line is a guide to the eye; the insets are sketches of archetypical energy
landscapes for large and small values of $\tilde E(1)/\langle E\rangle$.

Using Eq. (36), it is possible to estimate a single metric to characterize globally the
roughness of the energy landscape. We propose the ratio

$$\tilde E(1)/\langle E\rangle , \qquad (37)$$

where the tilde highlights the fact that $E(1)$ is estimated through the approximation $\tilde\sigma_C^2(r)$.
The quantity $E(1)$ is the average energy of all microstructures at distance $d = 1$ from
the ground state. Because the ground states have zero energy, $E(1)$ can be thought of as a
Laplacian in configuration space $\mathcal{C}$. The ratio $\tilde E(1)/\langle E\rangle$ is therefore a dimensionless measure
of the total curvature of the energy surface in the vicinity of any ground state. It has to be
stressed that $\tilde E(1)/\langle E\rangle$ is calculated from $S_2(r)$ alone, and that it is therefore not specific
to any particular ground state.

Figure 12 shows the quantitative relation between $\tilde E(1)/\langle E\rangle$ and the normalized ground-state
degeneracy $\Omega_0/\Omega_{tot}$ for a variety of microstructures defined on an 8 × 8 grid. The
microstructures used for the figure are available in the Supplementary Material [59]: they
comprise both non-degenerate disk-like objects and highly degenerate realizations of Poisson
point processes. The ground-state degeneracy of the latter was estimated via the MC
algorithm. The quantity $\Omega_0/\Omega_{tot}$ is found to be highly correlated with $\tilde E(1)/E_\infty$ over more
than 14 orders of magnitude.

When passing from small to large values of $\tilde E(1)/\langle E\rangle$, the energy landscape changes
qualitatively in the way suggested by the insets of Fig. 12. For low values of $\tilde E(1)/\langle E\rangle$,
the energy landscape has an overall funnel structure, with low energy barriers, which makes
it well suited for optimization problems. By contrast, for large values of $\tilde E(1)/\langle E\rangle$, the
landscape is very rough with a large ground-state degeneracy. It is, however, interesting to
note that the rightmost point in Fig. 12 is obtained for a system with $N_1 = 2$, which thus has
only a trivial degeneracy. The corresponding energy landscape is extremely rough because
any possible energy can be found at a Hamming distance as short as $d = 1$ from the ground
state, but the total number of configurations $\Omega_{tot}$ is also extremely small.

The data referred to as disks in Fig. 12 are a collection of non-degenerate microstructures
with increasing values of $N_1$. When increasing $N_1$, the roughness $\tilde E(1)/\langle E\rangle$ decreases but
the degeneracy remains equal to its trivial translation contribution $\Omega_0 = N$. It is noteworthy
that the values of $\Omega_0/\Omega_{tot}$ of these non-degenerate microstructures span the same curve as
the realizations of Poisson point processes, for which $\Omega_0$ has a huge non-trivial contribution.

The microstructures considered in Fig. 12 were limited to 8 × 8 grids, a size limit
imposed by the convergence of the MC algorithm. However, the fact that the $\Omega_0/\Omega_{tot}$-versus-$\tilde E(1)/\langle E\rangle$
curve does not discriminate trivial from non-trivial degeneracy should not
be limited to small microstructures. This assumption enables us to extend the curve to
larger microstructures by using disks of increasing sizes $N_1$, on grids with increasing sizes
$N$, for which the degeneracy is known to be exactly $\Omega_0 = N$. This was done in Fig. 13. The
degeneracy is plotted in the form of $\Delta I_{S_2} = \log_2(\Omega_{tot}/\Omega_0)$ for reasons that will be explained
in the next section. The inset of the figure shows that disks of all sizes and on all grids span a
single curve, which obeys approximately a power law.



V. DEGENERACY ANALYSIS USING AN INFORMATION-THEORETIC FORMULATION

The degeneracy $\Omega_0$ can be analyzed in terms of the information content associated with
a given two-point function $\hat S_2(r)$. Indeed, if a reconstruction problem is non-degenerate,
the two-point correlation function is a complete characterization of the microstructure. By
contrast, in the case of a large degeneracy the correlation function contains a relatively small
amount of microstructural information.

FIG. 13. Relation between the amount of structural information $\Delta I_{S_2}$ in the two-point function
and the roughness metric $\tilde E(1)/\langle E\rangle$. The relation was obtained from disks of increasing sizes $N_1$
defined on grids of size $N$ = 8 × 8 (•), 16 × 16 (+), 32 × 32 (o), 64 × 64 (□) and 128 × 128 (*). The
inset shows the data on logarithmic scales. The solid line is Eq. (41), obtained by least-square fit.

This idea can be made quantitative by borrowing concepts from information theory
[64, 65]. In that context, a given microstructure is considered to be the outcome of a random
process. More specifically, if nothing is known other than the total number of pixels $N$, then
specifying a given microstructure is equivalent to drawing it out of the complete configuration
space. Any microstructure is therefore an event having probability $p = 1/2^N$. The self-information
(or so-called surprisal) associated with such an event is

$$I = -\log_2(p) = N . \qquad (38)$$

The use of a base 2 logarithm ensures that the self-information is expressed in units of bits.
The self-information can be used as a quantitative measure for the information content of
the realization of an event. In this particular case, the value is $I = N$ bits, i.e., 1 bit per
pixel, which is consistent with each pixel being either black or white.

TABLE II. Information-theoretic analysis of the reconstructions of Figs. 1 and 5, with $N$ and $N_1$:
total number of pixels and number of black pixels; $\Delta\tilde I_{S_1}$: fractional information content of the
one-point statistics; $\tilde E(1)/\langle E\rangle$: roughness metric, calculated from $S_2(r)$ alone; $\Delta\tilde I_{S_2}$: fractional
information content of the two-point statistics; $\Delta\tilde I_{S_1} + \Delta\tilde I_{S_2}$: total information available for the
reconstruction.

Microstructure           $N$      $N_1$   $\Delta\tilde I_{S_1}$   $\tilde E(1)/\langle E\rangle$   $\Delta\tilde I_{S_2}$   $\Delta\tilde I_{S_1} + \Delta\tilde I_{S_2}$
Single disk              1024     200     0.29     2.27 × 10^-4     0.81     1.10
Hard disks               1024     200     0.29     6.38 × 10^-4     0.48     0.77
Poisson point process    1024     200     0.29     1.33 × 10^-2     0.10     0.39
Crystal                  16384    3000    0.31     1.79 × 10^-6     0.61     0.93
Polycrystal 1            16384    3000    0.31     6.33 × 10^-6     0.32     0.63
Polycrystal 2            16384    3000    0.31     1.36 × 10^-5     0.22     0.53

If the number of black pixels $N_1$ is known, a microstructure is no longer a draw out of
$2^N$ possibilities, but rather out of $\Omega_{tot} = \binom{N}{N_1}$. This implies that the self-information is reduced by a
quantity

$$\Delta I_{S_1} = N - \log_2(\Omega_{tot}) , \qquad (39)$$

which can be thought of as the amount of information contained in specifying the value
of $N_1$, compared to knowing merely the system size. We refer to $\Delta I_{S_1}$ as the information
content of the one-point statistics. The quantity $\Delta I_{S_1}$ depends on the particular value of
$N_1$. It is equal to $N$ for $N_1 = 0$ and for $N_1 = N$. In both cases specifying $N_1$ is a complete
description of the microstructure, since in those cases the system is either completely black
or completely white. By contrast, $\Delta I_{S_1}$ is minimum for $N_1 = N/2$, because this particular
value maximizes $\Omega_{tot}$.
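
The one-point information content can be evaluated directly from the binomial coefficient. The short sketch below uses a hypothetical function name and relies only on the definition of $\Delta I_{S_1}$ given above; the numerical inputs are the values of $N$ and $N_1$ listed in Table II.

```python
from math import comb, log2

def one_point_information(n_pixels, n_black):
    """Delta I_S1 = N - log2(binom(N, N1)), in bits."""
    return n_pixels - log2(comb(n_pixels, n_black))

bits_small = one_point_information(1024, 200)
fraction_small = bits_small / 1024          # fractional information content

bits_large = one_point_information(16384, 3000)
fraction_large = bits_large / 16384
```

With these inputs the fractional values come out close to 0.29 and 0.31, in line with the corresponding column of Table II.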

The same reasoning can be applied to quantify the information content of $S_2(r)$. Once
$S_2(r)$ is given, a particular microstructure is drawn out of $\Omega_0$ possibilities, and no longer
$\Omega_{tot}$. This suggests defining the quantity

$$\Delta I_{S_2} = \log_2(\Omega_{tot}) - \log_2(\Omega_0) \qquad (40)$$

to measure the information content of $S_2(r)$, in addition to knowing $N_1$.

Note that the information content of the one-point and two-point statistics can be understood
in terms of configurational entropies corresponding to different definitions of the
macrostate of the system. In the case of the one-point information, the macrostate is specified
via the value of $N_1$, which results in an entropy $\log_2(\Omega_{tot})$. Similarly, if $S_2(r)$ is used
to define the macrostate, the entropy becomes $\log_2(\Omega_0)$. The information content $\Delta I_{S_2}$ is
equal to the entropy reduction that results from incorporating $S_2(r)$ in the definition of the
macrostate of the system.

The analysis of Sec. IV suggests that $\Delta I_{S_2}$ can be accurately calculated from $\tilde E(1)/\langle E\rangle$
alone. This is the significance of Fig. 13. The inset of the figure shows that the dependency
is a power law of the type

$$\Delta I_{S_2} \propto \left[ \tilde E(1)/\langle E\rangle \right]^{-0.51} , \qquad (41)$$

where the numerical coefficients have been obtained by a least-square fit. It has to be
stressed here that the roughness metric $\tilde E(1)/\langle E\rangle$ is calculated from $\hat S_2(r)$ alone. Therefore,
Eq. (41) provides a practical means to estimate $\Delta I_{S_2}$ in any experimental context where
the only information about the system is its correlation function $\hat S_2(r)$.

For a reconstruction to be accurate, the total information available in the form of one-point
and two-point statistics has to be $N$ bits. Therefore, the utility of $\Delta I_{S_1}$ and $\Delta I_{S_2}$ is
best illustrated by normalizing them by $N$ and defining the following fractional information
contents $\Delta\tilde I_{S_1} = \Delta I_{S_1}/N$ and $\Delta\tilde I_{S_2} = \Delta I_{S_2}/N$, which take values between 0 and 1. A
reconstruction is accurate whenever the sum $\Delta\tilde I_{S_1} + \Delta\tilde I_{S_2}$ is close to one. Table II analyzes
the reconstruction examples of Figs. 1 and 5 along these lines.

The three microstructures of Fig. 1 all have $N_1 = 200$ black pixels on a grid with a
total of $N = 1024$ pixels. In this case, one estimates through Eq. (39) that the one-point
information is $\Delta\tilde I_{S_1} \simeq 0.29$. The two-point information $\Delta\tilde I_{S_2}$ for the various microstructures
was calculated from the roughness metric $\tilde E(1)/\langle E\rangle$ through Eq. (41) and the corresponding
values are reported in Table II. In the case of the single disk, the total information
content of $S_1$ and $S_2$ is close to 1, which means that a perfect reconstruction is possible.
The fact that the value is slightly larger than 1 results from the limited accuracy of Eq.
(41). In the case of the Poisson point process, the total information available amounts to
only 39% of the information required for the reconstruction: the reconstruction is therefore
impossible.

The case of the hard disks is not so clear-cut: the reconstruction captures many structural
characteristics of the target (Fig. 1) although only 77% of the information is available (Table
II). This seems to suggest that a fair reconstruction may be possible with about 20% of
missing structural information.

The information-theoretic analysis of the polycrystal reconstructions of Fig. 5 proceeds
in the same way. In the case of the single crystal, 93% of the information is available (see Table II),
which is consistent with the good quality of the reconstruction. For decreasing crystallite
sizes, the amount of information decreases. In the case of the smallest crystallites, the
amount of information is only about 50% and the reconstruction is expectedly inaccurate.
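
The internal consistency of the power law with exponent $-0.51$ and the values of Table II can be checked with a short sketch. Two assumptions are made here and are not taken from the fit itself: the power law is applied to $\Delta I_{S_2}$ expressed in bits with a prefactor independent of the grid size, and that prefactor `c` is calibrated from the single-disk row of Table II purely for illustration. Function names are hypothetical.

```python
EXPONENT = -0.51   # exponent of the power law relating Delta I_S2 to the roughness

# (N, roughness metric, fractional two-point information) as listed in Table II
TABLE_II = {
    "Single disk":           (1024,  2.27e-4, 0.81),
    "Hard disks":            (1024,  6.38e-4, 0.48),
    "Poisson point process": (1024,  1.33e-2, 0.10),
    "Crystal":               (16384, 1.79e-6, 0.61),
    "Polycrystal 1":         (16384, 6.33e-6, 0.32),
    "Polycrystal 2":         (16384, 1.36e-5, 0.22),
}

def calibrate_prefactor(n_pixels, roughness, fraction):
    """Prefactor c such that c * roughness**EXPONENT equals fraction * N bits (assumption)."""
    return fraction * n_pixels / roughness ** EXPONENT

def two_point_information(roughness, prefactor):
    """Delta I_S2 in bits from the roughness metric, assuming a pure power law."""
    return prefactor * roughness ** EXPONENT

c = calibrate_prefactor(*TABLE_II["Single disk"])
predicted = {name: two_point_information(rough, c) / n
             for name, (n, rough, _) in TABLE_II.items()}
```

Under these assumptions the predicted fractional values stay within about 0.02 of the tabulated ones for all six microstructures, including the larger grids, which is compatible with a single curve for all grid sizes.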

The consistency of the information analysis of the single-disk reconstruction was expected
because Fig. 13 and Eq. (41) are based on disks of various sizes. The validity of Eq. (41)
for microstructures other than disks is established only for very small systems, for which
the MC algorithm could be used (Fig. 12). The fact that the present analysis enables us to
predict the non-degeneracy of the crystal reconstruction strongly supports the generality of
Eq. (41).



VI. DISCUSSION AND CONCLUSIONS 



Throughout the paper, we have discussed several cases of trivially and non-trivially
degenerate microstructures. We have argued that the geometrical features that contribute to
decreasing the non-trivial contribution to $\Omega$ are those that lead to extremal values of $S_2(r)$
for given values of $r$. This is notably the case for a single disk under periodic boundary
conditions, which is the microstructure that maximizes $S_2(r)$ for sufficiently small values of $r$.
The crystal configuration shown in Fig. 5 is non-degenerate for similar reasons: because of its
anisotropy, that microstructure achieves extremal values of $S_2(r)$ for many different values
of $r$. The opposite situation is that of a Poisson point process, for which $S_2(r)$ takes values
close to the average value $\phi^2$ for all $r > 0$. This leads to a huge degeneracy (configurational
entropy) because none of the values of $S_2(r)$ is close to being extremal.
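
The contrast between extremal and near-average values of $S_2$ can be checked numerically on a small periodic grid. The following is a minimal sketch, not the procedure used in this work: it assumes that $S_2$ is evaluated for a lattice displacement `(dx, dy)` as the fraction of pixels for which both end points lie in the phase of interest, and it uses uncorrelated random pixels as a stand-in for a Poisson point process. The grid size, disk radius and random seed are arbitrary choices.

```python
import random

def s2(pixels, size, dx, dy):
    """Fraction of grid pixels for which both the pixel and its periodic
    (dx, dy) neighbour belong to the phase stored in `pixels`."""
    hits = sum(((x + dx) % size, (y + dy) % size) in pixels
               for x, y in pixels)
    return hits / size**2

size = 16
# Compact target: a digitised disk centred in the periodic box.
disk = {(x, y) for x in range(size) for y in range(size)
        if (x - 8)**2 + (y - 8)**2 <= 4.5**2}
# Disordered target: the same number of pixels placed at random.
rng = random.Random(0)
cells = [(x, y) for x in range(size) for y in range(size)]
scattered = set(rng.sample(cells, len(disk)))

phi = len(disk) / size**2
for name, config in (("disk", disk), ("random", scattered)):
    values = [round(s2(config, size, r, 0), 3) for r in (1, 2, 3)]
    print(name, values, "phi^2 =", round(phi**2, 3))
```

For the disk, the short-range values stay far above $\phi^2$, whereas for the random pixels they fluctuate around it.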

It is interesting to note that the single disk within a periodic box can be thought of as a
dilute distribution of disks in all of Euclidean space when the infinite number of periodically
replicated cells is considered. In the case of impenetrable disks at arbitrary density, it is
noteworthy that $S_2(r)$ can be exactly expressed in terms of a one-body or low-density term
(which contains the same shape and surface area information as in the dilute regime) and a
single higher-order two-body term involving pair statistics (3, S|. It is therefore the latter
term that is responsible for the degeneracy of such configurations for arbitrary densities.

The trivial contribution to the degeneracy $\Omega_0$ depends on the particular rotational
symmetry and chirality of the microstructure, but it is always of the order of the total number of
pixels in the grid, $N$. By contrast, the non-trivial contribution to the degeneracy can be
significantly larger. The Monte Carlo estimation of $\Omega$ shows that even a modestly sized
$8 \times 8$ Poisson point process can have a degeneracy as large as $\Omega_0 \sim 10^7$ (see Fig. H]). This
value is expected to increase exponentially with the size of the system because any possible
$S_2$-preserving pixel displacement contributes multiplicatively to $\Omega_0$.
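
The statement that the trivial degeneracy is of order $N$ can be illustrated by brute force on a small square grid. The sketch below makes two assumptions that go beyond this section: $S_2$ is binned by the periodic distance between pixels (so that rotations and mirrors leave it unchanged), and the trivial operations are the translations, 90-degree rotations and mirror images of the grid. The helper names and the two test shapes are hypothetical.

```python
from collections import Counter

def radial_s2(pixels, size):
    """Ordered same-phase pixel pairs, binned by squared periodic distance."""
    counts = Counter()
    for x1, y1 in pixels:
        for x2, y2 in pixels:
            dx = min((x1 - x2) % size, (x2 - x1) % size)
            dy = min((y1 - y2) % size, (y2 - y1) % size)
            counts[dx * dx + dy * dy] += 1
    return counts

def trivial_copies(pixels, size):
    """Distinct images under translations, rotations and mirrors of the grid."""
    copies = set()
    image = set(pixels)
    for _ in range(4):
        image = {(size - 1 - y, x) for x, y in image}       # rotate by 90 degrees
        mirrored = {(size - 1 - x, y) for x, y in image}    # mirror the rotated image
        for shape in (image, mirrored):
            for tx in range(size):
                for ty in range(size):
                    copies.add(frozenset(((x + tx) % size, (y + ty) % size)
                                         for x, y in shape))
    return copies

size = 6
l_shape = {(0, 0), (0, 1), (0, 2), (1, 0)}   # chiral: differs from its mirror image
block = {(0, 0), (0, 1), (1, 0), (1, 1)}     # invariant under all rotations and mirrors

for name, shape in (("L-shape", l_shape), ("2x2 block", block)):
    copies = trivial_copies(shape, size)
    same_s2 = all(radial_s2(c, size) == radial_s2(shape, size) for c in copies)
    print(name, len(copies), "copies; identical S2:", same_s2)
```

On this grid a chiral shape without any symmetry yields $8N$ distinct copies and a fully symmetric one yields $N$, so the count depends on symmetry and chirality but stays proportional to $N$.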

In order to quantitatively address the question of the degeneracy corresponding to any
specified correlation function, we have mapped it to the determination of a ground-state
degeneracy. This mapping led us to two breakthroughs. First, we can now calculate for the
first time the density of states for reconstruction problems via a Monte Carlo algorithm,
and in particular determine the values of $\Omega_0$ for a few benchmark systems. Second, we
built on the general observation throughout physics that large ground-state degeneracies are
generally associated with rough energy landscapes, which enabled us to use the roughness
of the energy landscape as a proxy for the microstructural degeneracy.

A natural metric for the roughness of the energy landscape is the total curvature of the
energy surface, evaluated at the ground states. Using a random walk in configuration space
(see Fig. 7), we derived an analytic expression for the total energy-surface curvature in the
form of $E(1)/\langle E\rangle$, which can be calculated in terms of $S_2(r)$ alone. The Monte Carlo analysis
confirms that $E(1)/\langle E\rangle$ is indeed highly correlated with the degeneracy of a reconstruction
problem, independently of the type of microstructure considered (Fig. 12). It has to be
noted that the roughness metric is consistent with the intuitive analysis of degeneracy in
terms of extremal values of $S_2(r)$. Indeed, the main contribution to the denominator $\langle E\rangle$ is
$\sum_r \chi^2(r)$, so that any value of $S_2(r)$ larger or smaller than $\phi^2$ contributes to decreasing the
roughness metric, and hence the degeneracy.

A counterintuitive result of the present study is that the distinction between trivial and
non-trivial degeneracy is irrelevant in configuration space $C$. In particular, the quantitative
relationship found between $\Omega$ and the roughness metric does not discriminate between the two
types of degeneracy. This enabled us to use trivially-degenerate microstructures to generate
a single calibration relation for $\Omega_0$ as a function of $E(1)/\langle E\rangle$. That relation applies to a
large variety of microstructures of sizes much larger than those analyzable by the Monte
Carlo method (see Fig. 13).

We should point out that although the examples discussed in the present work are all
two-dimensional microstructures, the same methodology can be applied in any space dimension.
It is indeed noteworthy that Eq. (24) and the approximation Eq. (36) are valid in
any space dimension. As a consequence, the roughness metric of any higher-dimensional
microstructure can be calculated easily from its correlation function $S_2(r)$ alone. Moreover,
our observation that the relation between the roughness metric and the
ground-state degeneracy does not discriminate trivial from non-trivial degeneracy is also
expected to hold in any space dimension. Therefore, higher-dimensional trivially-degenerate
microstructures (e.g., hyperspheres) can be used to produce a relation equivalent to Eq. (41)
or Fig. 13 in any space dimension.

It is also noteworthy that our analytical results do not assume Euclidean space: the only
restriction is that $\sum_j D_r(i,j)$ should be independent of $i$, where $D_r(i,j)$ is the operator used
to define $S_2(r)$ through Eq. (3). Therefore, the mathematical expression of the roughness
metric is valid in hyperbolic and spherical spaces as well as in any dimension. However, the
relationship between the degeneracy (configurational entropy) and the roughness metric is
expected to be space- and dimension-dependent.

The use of information-theoretic concepts allows our methods to be easily applied in
practice. As mentioned in the introduction, two-point correlation functions are often the
only data available experimentally for in situ studies with a nanometer resolution, notably
through small-angle scattering measurements 16l4l8|. The question of the structural
ambiguity of small-angle scattering patterns is an old one [19, 67], but the recent development
of very intense X-ray sources [68] has ignited a very lively debate about the possibility of
reconstructing nanometer-scale objects from scattering patterns [69]. Our analysis provides
a novel and very general approach to address this type of question: an accurate
reconstruction is possible whenever the amount of information $\mathcal{I}_{S_1} + \mathcal{I}_{S_2}$ is close to one. The
examples that we have discussed suggest that a relatively accurate reconstruction is possible
with up to 20% missing information, but it is premature to formulate any general rule.

It has to be stressed that, although the present work is based on the reconstruction
of microstructures defined on a grid with exact distances, the results apply unchanged to
the discrete reconstruction of microstructures starting from experimental (i.e., continuous)
correlation functions. Correlation functions of the type of the monocrystal (Fig. 5) are
unrealistic in an experimental context. However, the general relation between $E(1)/\langle E\rangle$ and
$\mathcal{I}_{S_2}$ still holds. The only difference is that experimental correlation functions of disordered
systems are generally of the polycrystal type, with very small crystallites. Except in some
exceptional cases, it is therefore expected that experimental correlation functions with no
orientation information should be highly degenerate.

The domain of applications of our results is not limited to scattering. Other applications
can notably be found in the field of computer vision for texture recognition. A texture with
low degeneracy $\Omega_0$ can in principle be discriminated robustly based on two-point statistics
alone, which would make slower three-point characterizations unnecessary [70].

Besides applications, information-theoretic concepts are also useful conceptually. It is
very natural that a reconstruction be possible whenever the information content of the
available data is equal to the amount of information required, i.e., $N$ bits, where $N$ is the
total number of pixels. In the cases we considered, the information came in the form of
one-point statistics $\mathcal{I}_{S_1}$ and of two-point statistics $\mathcal{I}_{S_2}$. However, the approach could
be generalized naturally to higher-order statistics. Quite generally, a successful
reconstruction would require all correlations to be considered up to the $m$th order, with $m$
satisfying

$$\sum_{n=1}^{m} \mathcal{I}_{S_n} \simeq 1 \qquad (42)$$

where $\mathcal{I}_{S_n}$ is the information contained in the $n$-point correlation function $S_n$ in addition to
$S_{n-1}$.

There is some evidence supporting the view that $S_3$ does not contain significant
information in addition to $S_2$ [22, 39]. In the present context this suggests that the series in
Eq. (42) converges slowly. The approach could be further generalized to other types of
statistical descriptors including lineal statistics [74, 75], pore-size functions [7], and cluster
correlation functions [39, 66].

We stress that the present work has numerous ramifications in materials sciences and
beyond. For instance, an important question concerns the realizability of two-point
correlation functions [76, 77]. It would be interesting to explore whether new necessary conditions
for the realizability of $S_2(r)$ can be obtained by expressing that the information content (in
bits) cannot exceed the total number of pixels in the microstructure.

Last but not least, other applications can be found in physics. Indeed, the Hamiltonian
of any system with pairwise additive energy can be written as

$$H = \sum_r v(r)\, S_2(r) \qquad (43)$$

where $v(r)$ is the pair interaction potential. It results from Eq. (43) that systems with
identical $S_2(r)$ necessarily have the same energy. Therefore the degeneracy $\Omega_0$ calculated
from $S_2(r)$ is a lower bound for the physical ground-state degeneracy of any system with
pairwise interaction energy. This includes systems such as frustrated Ising models for which
the ground-state degeneracy is not trivial. Another fascinating field of application is
that of quasi-crystalline microstructures [79], the degeneracy of which could be analyzed
with the general results obtained in the present study.
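
The link between identical $S_2$ and identical pairwise energy can be verified on a toy system. The sketch below is only an illustration under stated assumptions: it uses a one-dimensional periodic ring purely for brevity, a hypothetical pair potential $v(r) = 1/r^2$, and a normalization in which $S_2(r)$ counts ordered occupied pairs divided by the number of sites $n$, so that the energy reads $(n/2)\sum_r v(r) S_2(r)$. The prefactor depends on that normalization and is not taken from Eq. (43). The two configurations are a classic pair that share all pair distances on a ring of 8 sites without being translates or mirror images of one another.

```python
from collections import Counter
from itertools import combinations
from math import isclose

def ring_distance(i, j, n):
    """Shortest distance between sites i and j on a periodic ring of n sites."""
    d = abs(i - j) % n
    return min(d, n - d)

def s2(sites, n):
    """S2(r) for r >= 1: ordered occupied pairs at distance r, divided by n."""
    counts = Counter()
    for i, j in combinations(sites, 2):
        counts[ring_distance(i, j, n)] += 2   # counts both (i, j) and (j, i)
    return {r: c / n for r, c in counts.items()}

def potential(r):
    """Hypothetical pair potential, chosen only for illustration."""
    return 1.0 / r**2

def energy_direct(sites, n):
    """Pairwise-additive energy summed over distinct occupied pairs."""
    return sum(potential(ring_distance(i, j, n))
               for i, j in combinations(sites, 2))

def energy_from_s2(sites, n):
    """The same energy written as (n / 2) * sum_r v(r) * S2(r)."""
    return 0.5 * n * sum(potential(r) * value
                         for r, value in s2(sites, n).items())

n = 8
config_a = (0, 1, 3, 4)
config_b = (0, 1, 2, 5)   # not a translate or mirror image of config_a

print("identical S2:", s2(config_a, n) == s2(config_b, n))
print("energies:", energy_direct(config_a, n), energy_direct(config_b, n))
print("via S2:  ", energy_from_s2(config_a, n))
```

Because the energy depends on the configuration only through $S_2$, the equality of the two energies holds for any choice of `potential`, which is why every $S_2$-degenerate configuration is also energetically degenerate.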

From a methodological point of view, the general approach we have developed may be
valuable in the many fields where complex energy landscapes have to be characterized.
These include protein folding [80], complex chemical reactions [60], phase equilibria in
disordered porous materials [82], and glass transitions [61]. We hope to investigate some
of these aspects in future work.